
PERF: Add cache keyword to to_datetime (#11665) #17077

Merged: 25 commits merged into pandas-dev:master from the fix_11665 branch on Nov 11, 2017

Conversation

@mroeschke (Member) commented Jul 26, 2017

Added a cache keyword to to_datetime to speed up parsing of duplicate dates:
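For example (a hedged sketch; it uses the keyword's final name, cache, which early revisions of this PR still called cache_datetime):

import pandas as pd

dates = ['3/11/2000', '3/12/2000', '3/13/2000'] * 1000

# cache=False: every duplicate string is parsed from scratch
no_cache = pd.to_datetime(dates, cache=False)

# cache=True: each unique string is parsed once and the result is reused
with_cache = pd.to_datetime(dates, cache=True)

assert no_cache.equals(with_cache)  # results agree; only speed differs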

Some notes:

  1. I defaulted to cache=False, i.e. don't use a cache to parse the dates. Should the default be True?

  2. I used pd.unique() to identify the unique dates, but its current implementation did not accept a tuple of strings (objects). I added tuple_to_object_array and patched _ensure_arraylike to fix this.

  3. There is currently an included test that fails in one case: using to_datetime(..., utc=True) with a Series. I am inclined to believe In [5] below should have been the existing behavior. Thoughts?

In [2]: test_dates = ['20130101 00:00:00'] * 10

In [3]: s = pd.Series(test_dates)

# Same as existing behavior
In [4]: pd.to_datetime(s, utc=True, cache_datetime=False)
Out[4]: 
0   2013-01-01
1   2013-01-01
2   2013-01-01
3   2013-01-01
4   2013-01-01
5   2013-01-01
6   2013-01-01
7   2013-01-01
8   2013-01-01
9   2013-01-01
dtype: datetime64[ns]

In [5]: pd.to_datetime(s, utc=True, cache_datetime=True)
Out[5]: 
0   2013-01-01 00:00:00+00:00
1   2013-01-01 00:00:00+00:00
2   2013-01-01 00:00:00+00:00
3   2013-01-01 00:00:00+00:00
4   2013-01-01 00:00:00+00:00
5   2013-01-01 00:00:00+00:00
6   2013-01-01 00:00:00+00:00
7   2013-01-01 00:00:00+00:00
8   2013-01-01 00:00:00+00:00
9   2013-01-01 00:00:00+00:00
dtype: datetime64[ns, UTC]

@gfyoung (Member) commented Jul 26, 2017

Let me address 1 and 3 first: IIUC, this caching should only speed up performance; the results you get with or without caching should not differ. The fact that you are getting different results indicates something is going wrong in the caching.

In light of that reasoning, I would expect that caching should default to True.
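The invariant being argued for can be checked directly (a sketch using the keyword's final name, cache):

import pandas as pd

s = pd.Series(['20130101 00:00:00'] * 10)

# caching may change speed, never the result
with_cache = pd.to_datetime(s, utc=True, cache=True)
without_cache = pd.to_datetime(s, utc=True, cache=False)
assert with_cache.equals(without_cache)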

@gfyoung (Member) commented Jul 26, 2017

For 2, I think that's worthy of its own PR. I'm not sure I fully agree with the patch (i.e. why do we need another special-cased to_object_array method?), but even so, you should separate that out into another PR and submit it for review (along with tests).

After that gets merged, you can rebase this PR onto that one.

@jreback (Contributor) left a comment:

need asv!

Bench the cartesian product of several dtypes (int with unit, str, str with format, dti) and sizes (1000, 100000), with all-dupes and all-uniques.

I suspect under a certain size the cache doesn't matter/help; need to see where that is.
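A rough sketch of the shape such a benchmark suite could take (class, method, and parameter names are illustrative, not the ones eventually merged; the dti case is omitted for brevity):

import numpy as np
import pandas as pd


class ToDatetimeCacheSketch(object):
    # cartesian product: input size x all-duplicates vs. all-unique x cache on/off
    params = ([1000, 100000], [True, False], [True, False])
    param_names = ['N', 'unique', 'cache']

    def setup(self, N, unique, cache):
        rng = pd.date_range(start='2000-01-01', periods=N, freq='H')
        strings = rng.strftime('%Y-%m-%d %H:%M:%S').tolist()
        self.strings = strings if unique else [strings[0]] * N
        self.ints = np.arange(N) if unique else np.zeros(N, dtype='int64')

    def time_str(self, N, unique, cache):
        pd.to_datetime(self.strings, cache=cache)

    def time_str_format(self, N, unique, cache):
        pd.to_datetime(self.strings, format='%Y-%m-%d %H:%M:%S', cache=cache)

    def time_int_unit(self, N, unique, cache):
        pd.to_datetime(self.ints, unit='s', cache=cache)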

@@ -375,6 +375,23 @@ cpdef ndarray[object] list_to_object_array(list obj):

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef ndarray[object] tuple_to_object_array(tuple obj):
Contributor comment:

don't do this, you are duplicating lots of code. I suspect you are hitting:

In [2]: pd.unique(('foo'),)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-bd2fadf86f43> in <module>()
----> 1 pd.unique(('foo'),)

/Users/jreback/pandas/pandas/core/algorithms.py in unique(values)
    348     """
    349 
--> 350     values = _ensure_arraylike(values)
    351 
    352     # categorical is a fast-path

/Users/jreback/pandas/pandas/core/algorithms.py in _ensure_arraylike(values)
    171         inferred = lib.infer_dtype(values)
    172         if inferred in ['mixed', 'string', 'unicode']:
--> 173             values = lib.list_to_object_array(values)
    174         else:
    175             values = np.asarray(values)

TypeError: Argument 'obj' has incorrect type (expected list, got str)

if so, pls do a separate PR for this (it won't involve any cython code, just a simple change).
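A sketch of the simple change being suggested, in the pandas-internal context of _ensure_arraylike (lib and np as already imported in pandas/core/algorithms.py; this is not necessarily the patch that was merged):

# inside _ensure_arraylike (sketch)
inferred = lib.infer_dtype(values)
if inferred in ['mixed', 'string', 'unicode']:
    if isinstance(values, tuple):
        # coerce tuples onto the existing list fast path instead of
        # adding a separate tuple_to_object_array variant in cython
        values = list(values)
    values = lib.list_to_object_array(values)
else:
    values = np.asarray(values)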

@@ -170,7 +170,10 @@ def _ensure_arraylike(values):
ABCIndexClass, ABCSeries)):
inferred = lib.infer_dtype(values)
if inferred in ['mixed', 'string', 'unicode']:
values = lib.list_to_object_array(values)
Contributor comment:

see above

@@ -183,7 +184,8 @@ def _guess_datetime_format_for_array(arr, **kwargs):

def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
utc=None, box=True, format=None, exact=True,
unit=None, infer_datetime_format=False, origin='unix'):
unit=None, infer_datetime_format=False, origin='unix',
cache_datetime=False):
Contributor comment:

use_cache=True, change name and default to True

Contributor comment:

Why not call this cache=True?

@@ -257,6 +259,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

.. versionadded: 0.20.0

cache_datetime : boolean, default False
If True, use a cache of unique, converted dates to apply the datetime
conversion. Produces significant speed-ups when parsing duplicate dates.
Contributor comment:

see above, add versionadded tag

@@ -340,6 +346,19 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

tz = 'utc' if utc else None

cache = None
Contributor comment:

I think this can be moved inside _convert_listlike (maybe)

@@ -340,6 +346,19 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,

tz = 'utc' if utc else None

cache = None
if (cache_datetime and is_list_like(arg) and
not isinstance(arg, DatetimeIndex)):
Contributor comment:

when you write an asv (see below), this needs a min len arg (maybe 1000)
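A sketch of the kind of size guard being asked for (the 1000 cutoff is the suggested starting point; the helper name and placement are hypothetical, to be tuned against the asv results):

from pandas import DatetimeIndex
from pandas.api.types import is_list_like

_MIN_CACHE_SIZE = 1000  # hypothetical cutoff; below this, hashing likely costs more than it saves


def _should_try_cache(arg, cache):
    # skip the cache for DatetimeIndex inputs (already converted) and
    # for inputs too small for the cache to pay off
    return (cache and is_list_like(arg) and
            len(arg) >= _MIN_CACHE_SIZE and
            not isinstance(arg, DatetimeIndex))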

# No need to convert with a cache if the arg is already a DatetimeIndex
unique_dates = pd.unique(arg)
if len(unique_dates) != len(arg):
cache = {d: pd.to_datetime(d, errors=errors, dayfirst=dayfirst,
Contributor comment:

this is very inefficient, you are converting element by element; simply

Series(to_datetime(unique_dates.....), index=unique_dates)

this also avoids iterating over things
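Concretely, the vectorized cache build looks something like this (a standalone sketch using only the public API):

import pandas as pd

dates = ['20130101', '20130102'] * 5000  # heavily duplicated input

# parse each unique string exactly once, in a single vectorized call...
unique_dates = pd.unique(dates)
cache = pd.Series(pd.to_datetime(unique_dates), index=unique_dates)

# ...then map every original value through the cache
result = pd.Series(dates).map(cache)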

result = arg.map(cache)
else:
values = _convert_listlike(arg._values, False, format)
result = pd.Series(values, index=arg.index, name=arg.name)
Contributor comment:

leave the imports alone

@@ -1,5 +1,6 @@
from datetime import datetime, timedelta, time
import numpy as np
import pandas as pd
Contributor comment:

use algorithms.unique below

@@ -306,6 +306,45 @@ def test_to_datetime_tz_psycopg2(self):
dtype='datetime64[ns, UTC]')
tm.assert_index_equal(result, expected)

@pytest.mark.parametrize("box", [True, False])
Contributor comment:

this test is ok, but I would rather see a parametrize on pretty much every other test function in this file, testing both use_cache=True/False.
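The pattern being requested looks like this (an illustrative test, not one from the file):

import pytest
import pandas as pd
from pandas.testing import assert_index_equal


@pytest.mark.parametrize('cache', [True, False])
def test_to_datetime_roundtrip(cache):
    # every existing test runs twice: once with the cache, once without
    result = pd.to_datetime(['20130101'] * 3, cache=cache)
    expected = pd.DatetimeIndex(['2013-01-01'] * 3)
    assert_index_equal(result, expected)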

@mroeschke (Member, Author) commented Jul 26, 2017

@gfyoung Regarding my 3rd point: I agree that the results should be the same whether cache_datetime=True/False. I am just curious whether pd.to_datetime(Series(...), utc=True) should return a dtype: datetime64[ns, UTC] in the first place.

In [9]: pd.__version__
Out[9]: u'0.20.3'

In [10]: test_dates = ['20130101 00:00:00'] * 10

In [11]: s = pd.Series(test_dates)

# Should this result have a datetime64[ns, UTC] dtype like Out [13]?
In [12]: pd.to_datetime(s, utc=True)
Out[12]:
0   2013-01-01
1   2013-01-01
2   2013-01-01
3   2013-01-01
4   2013-01-01
5   2013-01-01
6   2013-01-01
7   2013-01-01
8   2013-01-01
9   2013-01-01
dtype: datetime64[ns]

In [13]: pd.to_datetime(test_dates, utc=True)
Out[13]:
DatetimeIndex(['2013-01-01', '2013-01-01', '2013-01-01', '2013-01-01',
               '2013-01-01', '2013-01-01', '2013-01-01', '2013-01-01',
               '2013-01-01', '2013-01-01'],
              dtype='datetime64[ns, UTC]', freq=None)

# I think this is essentially the result of my caching implementation. 
In [14]: pd.Series([pd.Timestamp('20130101 00:00:00', tz='utc')]*10)
Out[14]:
0   2013-01-01 00:00:00+00:00
1   2013-01-01 00:00:00+00:00
2   2013-01-01 00:00:00+00:00
3   2013-01-01 00:00:00+00:00
4   2013-01-01 00:00:00+00:00
5   2013-01-01 00:00:00+00:00
6   2013-01-01 00:00:00+00:00
7   2013-01-01 00:00:00+00:00
8   2013-01-01 00:00:00+00:00
9   2013-01-01 00:00:00+00:00
dtype: datetime64[ns, UTC]

This discussion may be more appropriate as a separate issue, if it is one. I may not hit this case anyway once I refactor the implementation.

@gfyoung (Member) commented Jul 26, 2017

> I am just curious whether pd.to_datetime(Series(...), utc=True) should return a dtype: datetime64[ns, UTC] in the first place.

I think it should. datetime64 is standard, and you specified utc=True.

@jreback (Contributor) commented Jul 26, 2017

there is another issue about utc=True

so it's out of scope for this PR

@mroeschke (Member, Author) commented:

Thanks for confirming @jreback and @gfyoung. I will work on implementing your suggestions.

@jreback (Contributor) commented Sep 23, 2017

can you rebase / update

@pep8speaks commented Sep 26, 2017

Hello @mroeschke! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 11, 2017 at 18:39 Hours UTC

@mroeschke (Member, Author) commented Sep 26, 2017

Here are the results from timeseries.ToDatetime ASVs.

>>> asv continuous -f 1.1 upstream/master fix_11665 -b timeseries.ToDatetime
      before           after         ratio
     [e0fe5cc6]       [72e99da0]
+     7.91±0.04ms           16.5ms     2.08  timeseries.ToDatetime.time_iso8601_format_no_sep
+     8.07±0.03ms      16.5±0.04ms     2.04  timeseries.ToDatetime.time_iso8601_nosep
+     8.18±0.01ms      16.4±0.03ms     2.00  timeseries.ToDatetime.time_iso8601
+     8.34±0.02ms      16.5±0.01ms     1.98  timeseries.ToDatetime.time_iso8601_format
+     12.5±0.07ms      14.8±0.05ms     1.18  timeseries.ToDatetime.time_format_YYYYMMDD
-        3.49±0ms         2.98±0ms     0.86  timeseries.ToDatetime.time_cache_with_dup_string_dates_and_format
-     3.54±0.01ms      2.98±0.01ms     0.84  timeseries.ToDatetime.time_cache_with_dup_string_dates
-     8.96±0.03ms      7.12±0.01ms     0.79  timeseries.ToDatetime.time_cache_with_dup_seconds_and_unit
-           2.11s       44.1±0.2ms     0.02  timeseries.ToDatetime.time_format_exact
-           2.02s      30.5±0.04ms     0.02  timeseries.ToDatetime.time_format_no_exact
-       407±0.3ms      3.35±0.01ms     0.01  timeseries.ToDatetime.time_cache_with_dup_string_tzoffset_dates

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

Additionally, I edited almost all the tests in pandas/tests/indexes/datetimes/test_tools.py to include @pytest.mark.parametrize on the cache keyword (True and False).

@codecov (bot) commented Sep 26, 2017

Codecov Report

Merging #17077 into master will decrease coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #17077      +/-   ##
==========================================
- Coverage   91.25%   91.22%   -0.04%     
==========================================
  Files         163      163              
  Lines       49810    49829      +19     
==========================================
- Hits        45456    45455       -1     
- Misses       4354     4374      +20
Flag       Coverage Δ
#multiple  89.02% <100%> (-0.02%) ⬇️
#single    40.32% <50%> (-0.06%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/indexes/datetimes.py   95.53% <ø> (ø) ⬆️
pandas/core/tools/datetimes.py     86.23% <100%> (+0.98%) ⬆️
pandas/io/gbq.py                   25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py      63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py               97.73% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5279a17...75eccc5.

@codecov (bot) commented Sep 26, 2017

Codecov Report

Merging #17077 into master will decrease coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #17077      +/-   ##
==========================================
- Coverage   91.42%   91.39%   -0.04%     
==========================================
  Files         163      163              
  Lines       50064    50091      +27     
==========================================
+ Hits        45773    45779       +6     
- Misses       4291     4312      +21
Flag       Coverage Δ
#multiple  89.2% <100%> (-0.02%) ⬇️
#single    40.36% <53.12%> (-0.06%) ⬇️

Impacted Files                   Coverage Δ
pandas/core/tools/datetimes.py   84.48% <100%> (+1.51%) ⬆️
pandas/io/gbq.py                 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py    63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py             97.8% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3493aba...07fa22d.


.. versionadded: 0.20.2
Contributor comment:

0.21.0

if len(unique_dates) != len(arg):
from pandas import Series
cache_dates = _convert_listlike(unique_dates, False, format)
convert_cache = Series(cache_dates, index=unique_dates)
Contributor comment:

so it's better to actually make convert_cache a function, which can then take the unique_dates and return the converted data; avoids lots of code duplication.
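A sketch of such a helper (name and signature hypothetical; the converter is passed in to stand in for the internal _convert_listlike):

import pandas as pd


def _build_convert_cache(arg, converter):
    """Convert the unique values of arg once and return a Series mapping
    original value -> converted value, or None when there are no
    duplicates to exploit."""
    unique_dates = pd.unique(arg)
    if len(unique_dates) == len(arg):
        return None
    return pd.Series(converter(unique_dates), index=unique_dates)


# usage sketch
convert_cache = _build_convert_cache(['20130101'] * 10, pd.to_datetime)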

@@ -334,7 +334,7 @@ def __new__(cls, data=None,
if not (is_datetime64_dtype(data) or is_datetimetz(data) or
is_integer_dtype(data)):
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)
yearfirst=yearfirst, cache=False)
Contributor comment:

reason for this change?

@mroeschke (Member, Author) replied:

This is in order to prevent a RuntimeError due to recursion.

The cache is built using _convert_listlike, which can return a DatetimeIndex; the DatetimeIndex constructor can then call to_datetime, which goes back to _convert_listlike...

@mroeschke force-pushed the fix_11665 branch 4 times, most recently from 0d5ac41 to 68c7e9f on October 4, 2017 at 02:22
@mroeschke (Member, Author) commented:

Tests are passing now, and here are the latest asv results:

>>> asv continuous -f 1.1 upstream/master fix_11665 -b timeseries.ToDatetime
      before           after         ratio
     [37860a5f]       [bdff633d]
+     8.05±0.05ms      16.3±0.09ms     2.02  timeseries.ToDatetime.time_iso8601
+     7.99±0.06ms      15.8±0.09ms     1.98  timeseries.ToDatetime.time_iso8601_nosep
+     8.53±0.06ms       16.5±0.3ms     1.94  timeseries.ToDatetime.time_iso8601_format_no_sep
+     8.72±0.02ms      16.0±0.04ms     1.84  timeseries.ToDatetime.time_iso8601_format
+      12.5±0.1ms      14.6±0.08ms     1.17  timeseries.ToDatetime.time_format_YYYYMMDD
-        3.53±0ms         3.14±0ms     0.89  timeseries.ToDatetime.time_cache_with_dup_string_dates_and_format
-     8.18±0.03ms         7.25±0ms     0.89  timeseries.ToDatetime.time_cache_with_dup_seconds_and_unit
-           2.07s      47.7±0.02ms     0.02  timeseries.ToDatetime.time_format_exact
-           1.96s      31.7±0.02ms     0.02  timeseries.ToDatetime.time_format_no_exact
-       399±0.5ms      3.51±0.01ms     0.01  timeseries.ToDatetime.time_cache_with_dup_string_tzoffset_dates

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@mroeschke changed the title from "[WIP] PERF: Add cache_datetime keyword to to_datetime (#11665)" to "PERF: Add cache keyword to to_datetime (#11665)" on Oct 5, 2017
@@ -111,7 +112,11 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
origin.

.. versionadded: 0.20.0
cache : boolean, default False
Contributor comment:

default is True

@@ -305,6 +310,29 @@ def _convert_listlike(arg, box, format, name=None, tz=tz):
except (ValueError, TypeError):
raise e

def _maybe_convert_cache(arg, cache, tz):
Contributor comment:

use a proper doc-string here
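For instance, a numpydoc-style docstring along these lines (a sketch; the parameter list is inferred from the diff above):

def _maybe_convert_cache(arg, cache, tz):
    """
    Try to convert arg using a cache of unique, already-converted dates.

    Parameters
    ----------
    arg : list-like of date-like values
        Values to be converted to datetime.
    cache : bool
        If True, attempt the cached conversion.
    tz : str or None
        'utc' to localize the result, None otherwise.

    Returns
    -------
    Series of converted values, or None if the cache was not used.
    """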

result = _maybe_convert_cache(arg, cache, tz)
if result is None:
result = _convert_listlike(arg, box, format, name=arg.name)
else:
Contributor comment:

why can't you handle these cases (list-like/index) inside _maybe_convert_cache? (I am talking about the else/box part.)

@jreback (Contributor) commented Oct 5, 2017

what is causing the 2x slowdown on some of the existing tests?

@chris-b1 (Contributor) commented Oct 5, 2017

It looks like the iso8601 path is so fast that caching always hurts - I think that makes sense, since that conversion shouldn't be much more expensive than hashing - but I'm not sure how to handle it.

@mroeschke (Member, Author) commented:

The 2x slowdown occurred with iso8601 date strings without any duplicates. I've attached the profile below of the benchmark with the largest slowdown. It looks like it's expensive to put the strings (objects) through algorithms.unique and then still go down the regular conversion path, since there are no duplicates.

In [3]: rng = date_range(start='1/1/2000', periods=20000, freq='H')

In [4]: strings = rng.strftime('%Y-%m-%d %H:%M:%S').tolist()

In [5]: cProfile.run('to_datetime(strings)', sort='cumtime')
         273 function calls (270 primitive calls) in 0.017 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.017    0.017 <string>:1(<module>)
        1    0.001    0.001    0.017    0.017 datetimes.py:39(to_datetime)
        1    0.000    0.000    0.009    0.009 datetimes.py:207(_convert_listlike)
        1    0.008    0.008    0.008    0.008 {pandas._libs.tslib.array_to_datetime}
        1    0.000    0.000    0.007    0.007 datetimes.py:313(_maybe_convert_cache)
        1    0.000    0.000    0.007    0.007 algorithms.py:276(unique)
        1    0.005    0.005    0.005    0.005 {method 'unique' of 'pandas._libs.hashtable.PyObjectHashTable' objects}
        1    0.000    0.000    0.001    0.001 algorithms.py:164(_ensure_arraylike)
        2    0.001    0.001    0.001    0.001 {pandas._libs.lib.infer_dtype}
        1    0.000    0.000    0.001    0.001 algorithms.py:132(_reconstruct_data)
        2    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
        1    0.000    0.000    0.001    0.001 algorithms.py:189(_get_hashtable_algo)
        1    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
      114    0.000    0.000    0.000    0.000 {isinstance}
    11/10    0.000    0.000    0.000    0.000 common.py:1773(_get_dtype_type)
        1    0.000    0.000    0.000    0.000 {pandas._libs.lib.list_to_object_array}
      2/1    0.000    0.000    0.000    0.000 _decorators.py:86(wrapper)
        1    0.000    0.000    0.000    0.000 datetimes.py:269(__new__)
        3    0.000    0.000    0.000    0.000 common.py:1545(is_bool_dtype)
        1    0.000    0.000    0.000    0.000 algorithms.py:39(_ensure_data)
        6    0.000    0.000    0.000    0.000 dtypes.py:85(is_dtype)
        3    0.000    0.000    0.000    0.000 common.py:334(is_datetime64tz_dtype)
        3    0.000    0.000    0.000    0.000 common.py:297(is_datetime64_dtype)
        1    0.000    0.000    0.000    0.000 datetimes.py:579(_simple_new)
       12    0.000    0.000    0.000    0.000 generic.py:7(_check)
        3    0.000    0.000    0.000    0.000 common.py:478(is_categorical_dtype)
        1    0.000    0.000    0.000    0.000 common.py:1049(is_datetime64_ns_dtype)
        2    0.000    0.000    0.000    0.000 dtypes.py:428(construct_from_string)
        1    0.000    0.000    0.000    0.000 common.py:1722(_get_dtype)
        3    0.000    0.000    0.000    0.000 common.py:85(is_object_dtype)
       18    0.000    0.000    0.000    0.000 {getattr}
        2    0.000    0.000    0.000    0.000 common.py:409(is_period_dtype)
        3    0.000    0.000    0.000    0.000 abc.py:128(__instancecheck__)
        2    0.000    0.000    0.000    0.000 inference.py:234(is_list_like)
        2    0.000    0.000    0.000    0.000 dtypes.py:370(__new__)
        2    0.000    0.000    0.000    0.000 common.py:873(is_unsigned_integer_dtype)
        2    0.000    0.000    0.000    0.000 dtypes.py:554(is_dtype)
        2    0.000    0.000    0.000    0.000 common.py:824(is_signed_integer_dtype)
        2    0.000    0.000    0.000    0.000 common.py:1493(is_float_dtype)
        8    0.000    0.000    0.000    0.000 {hasattr}
       11    0.000    0.000    0.000    0.000 {issubclass}
      6/5    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 common.py:442(is_interval_dtype)
        1    0.000    0.000    0.000    0.000 dtypes.py:676(is_dtype)
        4    0.000    0.000    0.000    0.000 _weakrefset.py:70(__contains__)
        5    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 numeric.py:463(asarray)
        2    0.000    0.000    0.000    0.000 dtypes.py:270(construct_from_string)
        2    0.000    0.000    0.000    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
        4    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_object}
        1    0.000    0.000    0.000    0.000 base.py:558(__len__)
        1    0.000    0.000    0.000    0.000 base.py:552(_reset_identity)
        1    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x7f88fb67a4c0}
        1    0.000    0.000    0.000    0.000 {pandas._libs.tslibs.timezones.maybe_get_tz}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 base.py:467(_deepcopy_if_needed)
        1    0.000    0.000    0.000    0.000 frequencies.py:391(to_offset)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

@jorisvandenbossche (Member) commented:

Given the potential slowdown, can't we add it as an optional keyword? (so False by default)

@jreback jreback merged commit b36dab5 into pandas-dev:master Nov 11, 2017
@jreback (Contributor) commented Nov 11, 2017

thanks @mroeschke nice PR!

look forward to cache='infer' !

pls open an issue for this.

Successfully merging this pull request may close: PERF: add datetime caching kw in to_datetime (#11665)